Skip to content

Conversation

@tgs-sc
Copy link
Contributor

@tgs-sc tgs-sc commented Mar 27, 2025

Introduced a way for generating schema for validating input YAML. This can be useful to see the full structure of the input YAML file for different llvm based tools that use existing YAML parser, for example clang-format, clang-tidy e.t.c. This commit also can be useful for yaml-language-server.

@github-actions
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@tgs-sc tgs-sc changed the title [llvm][Support] Add YAMLSchemeGen for producing YAML Schemes from YAM… [llvm][Support] Add YAMLSchemeGen for producing YAML Schemes from YAMLTraits Mar 27, 2025
@llvmbot
Copy link
Member

llvmbot commented Mar 27, 2025

@llvm/pr-subscribers-llvm-support

Author: None (tgs-sc)

Changes

Introduced a way for generating scheme for validating input YAML. This can be useful to see the full structure of the input YAML file for different llvm based tools that use existing YAML parser, for example clang-format, clang-tidy e.t.c. This commit also can be useful for yaml-language-server.


Patch is 34.07 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/133284.diff

8 Files Affected:

  • (added) llvm/include/llvm/Support/YAMLSchemeGen.h (+497)
  • (modified) llvm/include/llvm/Support/YAMLTraits.h (+48-27)
  • (modified) llvm/lib/Support/CMakeLists.txt (+1)
  • (added) llvm/lib/Support/YAMLSchemeGen.cpp (+163)
  • (modified) llvm/lib/Support/YAMLTraits.cpp (+4)
  • (modified) llvm/unittests/ObjectYAML/CMakeLists.txt (+1)
  • (added) llvm/unittests/ObjectYAML/YAMLSchemeGen.cpp (+103)
  • (modified) llvm/utils/gn/secondary/llvm/lib/Support/BUILD.gn (+1)
diff --git a/llvm/include/llvm/Support/YAMLSchemeGen.h b/llvm/include/llvm/Support/YAMLSchemeGen.h
new file mode 100644
index 0000000000000..8100e3f1d175f
--- /dev/null
+++ b/llvm/include/llvm/Support/YAMLSchemeGen.h
@@ -0,0 +1,497 @@
+//===- llvm/Support/YAMLSchemeGen.h -----------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_SUPPORT_YAMLSCHEMEGEN_H
+#define LLVM_SUPPORT_YAMLSCHEMEGEN_H
+
+#include "llvm/Support/YAMLTraits.h"
+
+namespace llvm {
+
+namespace yaml {
+
+class SchemeGen : public IO {
+public:
+  SchemeGen(raw_ostream &RO, void *Ctxt = nullptr, int WrapColumn = 70);
+  ~SchemeGen() override = default;
+
+  IOKind getKind() const override;
+  bool outputting() const override;
+  bool mapTag(StringRef, bool) override;
+  void beginMapping() override;
+  void endMapping() override;
+  bool preflightKey(const char *key, bool, bool, bool &, void *&) override;
+  void postflightKey(void *) override;
+  std::vector<StringRef> keys() override;
+  void beginFlowMapping() override;
+  void endFlowMapping() override;
+  unsigned beginSequence() override;
+  void endSequence() override;
+  bool preflightElement(unsigned, void *&) override;
+  void postflightElement(void *) override;
+  unsigned beginFlowSequence() override;
+  bool preflightFlowElement(unsigned, void *&) override;
+  void postflightFlowElement(void *) override;
+  void endFlowSequence() override;
+  void beginEnumScalar() override;
+  bool matchEnumScalar(const char *, bool) override;
+  bool matchEnumFallback() override;
+  void endEnumScalar() override;
+  bool beginBitSetScalar(bool &) override;
+  bool bitSetMatch(const char *, bool) override;
+  void endBitSetScalar() override;
+  void scalarString(StringRef &, QuotingType) override;
+  void blockScalarString(StringRef &) override;
+  void scalarTag(std::string &) override;
+  NodeKind getNodeKind() override;
+  void setError(const Twine &message) override;
+  bool canElideEmptySequence() override;
+
+  // These are only used by operator<<. They could be private
+  // if that templated operator could be made a friend.
+  void beginDocuments();
+  bool preflightDocument(unsigned);
+  void postflightDocument();
+  void endDocuments();
+
+  // These are needed to yamlize scheme.
+  struct YNode {
+  public:
+    YNode(NodeKind Kind) : Kind(Kind) {}
+
+    NodeKind getNodeKind() const { return Kind; }
+
+  private:
+    NodeKind Kind;
+  };
+
+  class ScalarYNode final : public YNode {
+    StringRef Str;
+    QuotingType QT;
+
+  public:
+    ScalarYNode(StringRef Str, QuotingType QT)
+        : YNode(NodeKind::Scalar), Str(Str), QT(QT) {}
+
+    StringRef getValue() const { return Str; }
+
+    QuotingType getQuotingType() const { return QT; }
+  };
+
+  class MapYNode final : public YNode,
+                         SmallVector<std::pair<StringRef, YNode *>, 8> {
+  public:
+    using BaseVector = SmallVector<std::pair<StringRef, YNode *>, 8>;
+
+    MapYNode() : YNode(NodeKind::Map) {}
+
+    using BaseVector::begin;
+    using BaseVector::emplace_back;
+    using BaseVector::end;
+    using BaseVector::size;
+  };
+
+  class SequenceYNode final : public YNode, SmallVector<YNode *, 8> {
+  public:
+    using BaseVector = SmallVector<YNode *, 8>;
+
+    SequenceYNode() : YNode(NodeKind::Sequence) {}
+
+    using BaseVector::operator[];
+    using BaseVector::begin;
+    using BaseVector::emplace_back;
+    using BaseVector::end;
+    using BaseVector::resize;
+    using BaseVector::size;
+  };
+
+private:
+  template <typename YNodeType, typename... YNodeArgs>
+  YNodeType *createYNode(YNodeArgs &&...Args) {
+    auto UPtr = std::make_unique<YNodeType>(std::forward<YNodeArgs>(Args)...);
+    auto *Ptr = UPtr.get();
+    YNodes.emplace_back(std::move(UPtr));
+    return Ptr;
+  }
+
+public:
+  ScalarYNode *createScalarYNode(StringRef Str) {
+    return createYNode<ScalarYNode>(Str, QuotingType::None);
+  }
+
+  MapYNode *createMapYNode() { return createYNode<MapYNode>(); }
+
+  SequenceYNode *createSequenceYNode() { return createYNode<SequenceYNode>(); }
+
+  std::vector<std::unique_ptr<YNode>> YNodes;
+
+  using StringVector = SmallVector<StringRef, 8>;
+
+  class SchemeNode {
+  public:
+    virtual YNode *yamlize(SchemeGen &Gen) const = 0;
+
+    virtual ~SchemeNode() = default;
+  };
+
+  enum class PropertyKind : uint8_t {
+    Properties,
+    Required,
+    Type,
+    Enum,
+    Items,
+  };
+
+  class Scheme;
+
+  class SchemeProperty : public SchemeNode {
+    PropertyKind Kind;
+    StringRef Name;
+
+  public:
+    SchemeProperty(PropertyKind Kind, StringRef Name)
+        : Kind(Kind), Name(Name) {}
+
+    PropertyKind getKind() const { return Kind; }
+
+    StringRef getName() const { return Name; }
+  };
+
+  class PropertiesProperty final
+      : public SchemeProperty,
+        SmallVector<std::pair<StringRef, Scheme *>, 8> {
+  public:
+    using BaseVector = SmallVector<std::pair<StringRef, Scheme *>, 8>;
+
+    PropertiesProperty()
+        : SchemeProperty(PropertyKind::Properties, "properties") {}
+
+    using BaseVector::begin;
+    using BaseVector::emplace_back;
+    using BaseVector::end;
+    using BaseVector::size;
+
+    YNode *yamlize(SchemeGen &Gen) const override {
+      auto *Map = Gen.createMapYNode();
+      for (auto &&[Key, Value] : *this) {
+        auto *Y = Value->yamlize(Gen);
+        Map->emplace_back(Key, Y);
+      }
+      return Map;
+    }
+  };
+
+  class RequiredProperty final : public SchemeProperty, StringVector {
+  public:
+    using BaseVector = StringVector;
+
+    RequiredProperty() : SchemeProperty(PropertyKind::Required, "required") {}
+
+    using BaseVector::begin;
+    using BaseVector::emplace_back;
+    using BaseVector::end;
+    using BaseVector::size;
+
+    YNode *yamlize(SchemeGen &Gen) const override {
+      if (size() == 1) {
+        auto *Scalar = Gen.createScalarYNode(*begin());
+        return Scalar;
+      }
+      auto *Sequence = Gen.createSequenceYNode();
+      for (auto &&Value : *this) {
+        auto *Scalar = Gen.createScalarYNode(Value);
+        Sequence->emplace_back(Scalar);
+      }
+      return Sequence;
+    }
+  };
+
+  class TypeProperty final : public SchemeProperty {
+    StringRef Value = "any";
+
+  public:
+    TypeProperty() : SchemeProperty(PropertyKind::Type, "type") {}
+
+    StringRef getValue() const { return Value; }
+
+    void setValue(StringRef Val) { Value = Val; }
+
+    YNode *yamlize(SchemeGen &Gen) const override {
+      auto *Scalar = Gen.createScalarYNode(Value);
+      return Scalar;
+    }
+  };
+
+  class EnumProperty final : public SchemeProperty, StringVector {
+  public:
+    using BaseVector = StringVector;
+
+    EnumProperty() : SchemeProperty(PropertyKind::Enum, "enum") {}
+
+    using BaseVector::begin;
+    using BaseVector::emplace_back;
+    using BaseVector::end;
+    using BaseVector::size;
+
+    YNode *yamlize(SchemeGen &Gen) const override {
+      auto *Sequence = Gen.createSequenceYNode();
+      for (auto &&Value : *this) {
+        auto *Scalar = Gen.createScalarYNode(Value);
+        Sequence->emplace_back(Scalar);
+      }
+      return Sequence;
+    }
+  };
+
+  class ItemsProperty final : public SchemeProperty, SmallVector<Scheme *, 8> {
+  public:
+    using BaseVector = SmallVector<Scheme *, 8>;
+
+    ItemsProperty() : SchemeProperty(PropertyKind::Items, "items") {}
+
+    using BaseVector::begin;
+    using BaseVector::emplace_back;
+    using BaseVector::end;
+    using BaseVector::size;
+
+    YNode *yamlize(SchemeGen &Gen) const override {
+      auto *Sequence = Gen.createSequenceYNode();
+      for (auto &&Value : *this) {
+        auto *Y = Value->yamlize(Gen);
+        Sequence->emplace_back(Y);
+      }
+      return Sequence;
+    }
+  };
+
+  class Scheme final : public SchemeNode, SmallVector<SchemeProperty *, 8> {
+  public:
+    using BaseVector = SmallVector<SchemeProperty *, 8>;
+
+    Scheme() = default;
+
+    using BaseVector::begin;
+    using BaseVector::emplace_back;
+    using BaseVector::end;
+    using BaseVector::size;
+
+    template <typename PropertyType> PropertyType *findProperty() const {
+      auto P = PropertyType{};
+      auto Found = std::find_if(begin(), end(), [&](auto &&Prop) {
+        return Prop->getKind() == P.getKind();
+      });
+      return Found != end() ? static_cast<PropertyType *>(*Found) : nullptr;
+    }
+
+    template <typename PropertyType> bool hasProperty() const {
+      return findProperty<PropertyType>() != nullptr;
+    }
+
+    YNode *yamlize(SchemeGen &Gen) const override {
+      auto *Map = Gen.createMapYNode();
+      for (auto &&Value : *this) {
+        auto *Y = Value->yamlize(Gen);
+        auto Name = Value->getName();
+        Map->emplace_back(Name, Y);
+      }
+      return Map;
+    }
+  };
+
+private:
+  std::vector<std::unique_ptr<SchemeNode>> SchemeNodes;
+  SmallVector<Scheme *, 8> Schemes;
+
+  template <typename PropertyType> PropertyType *createProperty() {
+    auto UPtr = std::make_unique<PropertyType>();
+    auto *Ptr = UPtr.get();
+    SchemeNodes.emplace_back(std::move(UPtr));
+    return Ptr;
+  }
+
+public:
+  template <typename PropertyType>
+  PropertyType *getOrCreateProperty(Scheme &S) {
+    auto Found = S.findProperty<PropertyType>();
+    if (!Found) {
+      Found = createProperty<PropertyType>();
+      S.emplace_back(Found);
+    }
+    return Found;
+  }
+
+  Scheme *createScheme() {
+    auto UPtr = std::make_unique<Scheme>();
+    auto *Ptr = UPtr.get();
+    SchemeNodes.emplace_back(std::move(UPtr));
+    return Ptr;
+  }
+
+  Scheme *getTopScheme() const {
+    return Schemes.empty() ? nullptr : Schemes.back();
+  }
+
+private:
+  Output O;
+  SchemeNode *Root = nullptr;
+};
+
+template <> struct PolymorphicTraits<SchemeGen::YNode> {
+  static NodeKind getKind(const SchemeGen::YNode &N) { return N.getNodeKind(); }
+
+  static SchemeGen::ScalarYNode &getAsScalar(SchemeGen::YNode &N) {
+    return (SchemeGen::ScalarYNode &)N;
+  }
+
+  static SchemeGen::MapYNode &getAsMap(SchemeGen::YNode &N) {
+    return (SchemeGen::MapYNode &)N;
+  }
+
+  static SchemeGen::SequenceYNode &getAsSequence(SchemeGen::YNode &N) {
+    return (SchemeGen::SequenceYNode &)N;
+  }
+};
+
+template <> struct ScalarTraits<SchemeGen::ScalarYNode> {
+  static void output(const SchemeGen::ScalarYNode &N, void *Ctx,
+                     llvm::raw_ostream &Out) {
+    Out << N.getValue();
+  }
+
+  static StringRef input(StringRef Scalar, void *Ctx,
+                         SchemeGen::ScalarYNode &N) {
+    return {};
+  }
+
+  static QuotingType mustQuote(StringRef) { return QuotingType::None; }
+};
+
+template <> struct MappingTraits<SchemeGen::MapYNode> {
+  static void mapping(IO &io, SchemeGen::MapYNode &N) {
+    for (auto &&[Key, Value] : N) {
+      io.mapRequired(Key.data(), *Value);
+    }
+  }
+};
+
+template <> struct SequenceTraits<SchemeGen::SequenceYNode> {
+  static size_t size(IO &io, SchemeGen::SequenceYNode &N) { return N.size(); }
+
+  static SchemeGen::YNode &element(IO &, SchemeGen::SequenceYNode &N,
+                                   size_t Idx) {
+    if (Idx >= N.size())
+      N.resize(Idx + 1);
+    return *(N[Idx]);
+  }
+};
+
+// Define non-member operator<< so that Output can stream out document list.
+template <typename T>
+inline std::enable_if_t<has_DocumentListTraits<T>::value, SchemeGen &>
+operator<<(SchemeGen &Gen, T &DocList) {
+  EmptyContext Ctx;
+  Gen.beginDocuments();
+  const size_t count = DocumentListTraits<T>::size(Gen, DocList);
+  for (size_t i = 0; i < count; ++i) {
+    if (Gen.preflightDocument(i)) {
+      yamlize(Gen, DocumentListTraits<T>::element(Gen, DocList, i), true, Ctx);
+      Gen.postflightDocument();
+    }
+  }
+  Gen.endDocuments();
+  return Gen;
+}
+
+// Define non-member operator<< so that Output can stream out a map.
+template <typename T>
+inline std::enable_if_t<has_MappingTraits<T, EmptyContext>::value, SchemeGen &>
+operator<<(SchemeGen &Gen, T &Map) {
+  EmptyContext Ctx;
+  Gen.beginDocuments();
+  if (Gen.preflightDocument(0)) {
+    yamlize(Gen, Map, true, Ctx);
+    Gen.postflightDocument();
+  }
+  Gen.endDocuments();
+  return Gen;
+}
+
+// Define non-member operator<< so that Output can stream out a sequence.
+template <typename T>
+inline std::enable_if_t<has_SequenceTraits<T>::value, SchemeGen &>
+operator<<(SchemeGen &Gen, T &Seq) {
+  EmptyContext Ctx;
+  Gen.beginDocuments();
+  if (Gen.preflightDocument(0)) {
+    yamlize(Gen, Seq, true, Ctx);
+    Gen.postflightDocument();
+  }
+  Gen.endDocuments();
+  return Gen;
+}
+
+// Define non-member operator<< so that Output can stream out a block scalar.
+template <typename T>
+inline std::enable_if_t<has_BlockScalarTraits<T>::value, SchemeGen &>
+operator<<(SchemeGen &Gen, T &Val) {
+  EmptyContext Ctx;
+  Gen.beginDocuments();
+  if (Gen.preflightDocument(0)) {
+    yamlize(Gen, Val, true, Ctx);
+    Gen.postflightDocument();
+  }
+  Gen.endDocuments();
+  return Gen;
+}
+
+// Define non-member operator<< so that Output can stream out a string map.
+template <typename T>
+inline std::enable_if_t<has_CustomMappingTraits<T>::value, SchemeGen &>
+operator<<(SchemeGen &Gen, T &Val) {
+  EmptyContext Ctx;
+  Gen.beginDocuments();
+  if (Gen.preflightDocument(0)) {
+    yamlize(Gen, Val, true, Ctx);
+    Gen.postflightDocument();
+  }
+  Gen.endDocuments();
+  return Gen;
+}
+
+// Define non-member operator<< so that Output can stream out a polymorphic
+// type.
+template <typename T>
+inline std::enable_if_t<has_PolymorphicTraits<T>::value, SchemeGen &>
+operator<<(SchemeGen &Gen, T &Val) {
+  EmptyContext Ctx;
+  Gen.beginDocuments();
+  if (Gen.preflightDocument(0)) {
+    // FIXME: The parser does not support explicit documents terminated with a
+    // plain scalar; the end-marker is included as part of the scalar token.
+    assert(PolymorphicTraits<T>::getKind(Val) != NodeKind::Scalar &&
+           "plain scalar documents are not supported");
+    yamlize(Gen, Val, true, Ctx);
+    Gen.postflightDocument();
+  }
+  Gen.endDocuments();
+  return Gen;
+}
+
+// Provide better error message about types missing a trait specialization
+template <typename T>
+inline std::enable_if_t<missingTraits<T, EmptyContext>::value, SchemeGen &>
+operator<<(SchemeGen &Gen, T &seq) {
+  char missing_yaml_trait_for_type[sizeof(MissingTrait<T>)];
+  return Gen;
+}
+
+} // namespace yaml
+
+} // namespace llvm
+
+#endif // LLVM_SUPPORT_YAMLSCHEMEGEN_H
diff --git a/llvm/include/llvm/Support/YAMLTraits.h b/llvm/include/llvm/Support/YAMLTraits.h
index e707a445012b5..61f90fbdc5ac8 100644
--- a/llvm/include/llvm/Support/YAMLTraits.h
+++ b/llvm/include/llvm/Support/YAMLTraits.h
@@ -774,12 +774,19 @@ struct unvalidatedMappingTraits
           bool, has_MappingTraits<T, Context>::value &&
                     !has_MappingValidateTraits<T, Context>::value> {};
 
+enum class IOKind : uint8_t {
+  Outputting,
+  Inputting,
+  SchemeGenering,
+};
+
 // Base class for Input and Output.
 class IO {
 public:
   IO(void *Ctxt = nullptr);
   virtual ~IO();
 
+  virtual IOKind getKind() const = 0;
   virtual bool outputting() const = 0;
 
   virtual unsigned beginSequence() = 0;
@@ -824,15 +831,17 @@ class IO {
 
   template <typename T>
   void enumCase(T &Val, const char* Str, const T ConstVal) {
-    if ( matchEnumScalar(Str, outputting() && Val == ConstVal) ) {
+    if (matchEnumScalar(Str,
+                        getKind() == IOKind::Outputting && Val == ConstVal)) {
       Val = ConstVal;
     }
   }
 
   // allow anonymous enum values to be used with LLVM_YAML_STRONG_TYPEDEF
   template <typename T>
-  void enumCase(T &Val, const char* Str, const uint32_t ConstVal) {
-    if ( matchEnumScalar(Str, outputting() && Val == static_cast<T>(ConstVal)) ) {
+  void enumCase(T &Val, const char *Str, const uint32_t ConstVal) {
+    if (matchEnumScalar(Str, getKind() == IOKind::Outputting &&
+                                 Val == static_cast<T>(ConstVal))) {
       Val = ConstVal;
     }
   }
@@ -849,8 +858,9 @@ class IO {
   }
 
   template <typename T>
-  void bitSetCase(T &Val, const char* Str, const T ConstVal) {
-    if ( bitSetMatch(Str, outputting() && (Val & ConstVal) == ConstVal) ) {
+  void bitSetCase(T &Val, const char *Str, const T ConstVal) {
+    if (bitSetMatch(Str, getKind() == IOKind::Outputting &&
+                             (Val & ConstVal) == ConstVal)) {
       Val = static_cast<T>(Val | ConstVal);
     }
   }
@@ -858,21 +868,24 @@ class IO {
   // allow anonymous enum values to be used with LLVM_YAML_STRONG_TYPEDEF
   template <typename T>
   void bitSetCase(T &Val, const char* Str, const uint32_t ConstVal) {
-    if ( bitSetMatch(Str, outputting() && (Val & ConstVal) == ConstVal) ) {
+    if (bitSetMatch(Str, getKind() == IOKind::Outputting &&
+                             (Val & ConstVal) == ConstVal)) {
       Val = static_cast<T>(Val | ConstVal);
     }
   }
 
   template <typename T>
   void maskedBitSetCase(T &Val, const char *Str, T ConstVal, T Mask) {
-    if (bitSetMatch(Str, outputting() && (Val & Mask) == ConstVal))
+    if (bitSetMatch(Str, getKind() == IOKind::Outputting &&
+                             (Val & Mask) == ConstVal))
       Val = Val | ConstVal;
   }
 
   template <typename T>
   void maskedBitSetCase(T &Val, const char *Str, uint32_t ConstVal,
                         uint32_t Mask) {
-    if (bitSetMatch(Str, outputting() && (Val & Mask) == ConstVal))
+    if (bitSetMatch(Str, getKind() == IOKind::Outputting &&
+                             (Val & Mask) == ConstVal))
       Val = Val | ConstVal;
   }
 
@@ -942,7 +955,8 @@ class IO {
                              bool Required, Context &Ctx) {
     void *SaveInfo;
     bool UseDefault;
-    const bool sameAsDefault = outputting() && Val == DefaultValue;
+    const bool sameAsDefault =
+        (getKind() == IOKind::Outputting) && Val == DefaultValue;
     if ( this->preflightKey(Key, Required, sameAsDefault, UseDefault,
                                                                   SaveInfo) ) {
       yamlize(*this, Val, Required, Ctx);
@@ -1004,7 +1018,7 @@ yamlize(IO &io, T &Val, bool, EmptyContext &Ctx) {
 template <typename T>
 std::enable_if_t<has_ScalarTraits<T>::value, void> yamlize(IO &io, T &Val, bool,
                                                            EmptyContext &Ctx) {
-  if ( io.outputting() ) {
+  if (io.getKind() != IOKind::Inputting) {
     SmallString<128> Storage;
     raw_svector_ostream Buffer(Storage);
     ScalarTraits<T>::output(Val, io.getContext(), Buffer);
@@ -1024,7 +1038,7 @@ std::enable_if_t<has_ScalarTraits<T>::value, void> yamlize(IO &io, T &Val, bool,
 template <typename T>
 std::enable_if_t<has_BlockScalarTraits<T>::value, void>
 yamlize(IO &YamlIO, T &Val, bool, EmptyContext &Ctx) {
-  if (YamlIO.outputting()) {
+  if (YamlIO.getKind() != IOKind::Inputting) {
     std::string Storage;
     raw_string_ostream Buffer(Storage);
     BlockScalarTraits<T>::output(Val, YamlIO.getContext(), Buffer);
@@ -1043,7 +1057,7 @@ yamlize(IO &YamlIO, T &Val, bool, EmptyContext &Ctx) {
 template <typename T>
 std::enable_if_t<has_TaggedScalarTraits<T>::value, void>
 yamlize(IO &io, T &Val, bool, EmptyContext &Ctx) {
-  if (io.outputting()) {
+  if (io.getKind() != IOKind::Inputting) {
     std::string ScalarStorage, TagStorage;
     raw_string_ostream ScalarBuffer(ScalarStorage), TagBuffer(TagStorage);
     TaggedScalarTraits<T>::output(Val, io.getContext(), ScalarBuffer,
@@ -1085,7 +1099,7 @@ yamlize(IO &io, T &Val, bool, Context &Ctx) {
     io.beginFlowMapping();
   else
     io.beginMapping();
-  if (io.outputting()) {
+  if (io.getKind() == IOKind::Outputting) {
     std::string Err = detail::doValidate(io, Val, Ctx);
     if (!Err.empty()) {
       errs() << Err << "\n";
@@ -1093,7 +1107,7 @@ yamlize(IO &io, T &Val, bool, Context &Ctx) {
     }
   }
   detail::doMapping(io, Val, Ctx);
-  if (!io.outputting()) {
+  if (io.getKind() == IOKind::Inputting) {
     std::string Err = detail::doValidate(io, Val, Ctx);
     if (!Err.empty())
       io.setError(Err);
@@ -1113,7 +1127,7 @@ yamlizeMappingEnumInput(IO &io, T &Val...
[truncated]

@tgs-sc tgs-sc force-pushed the users/tgs-sc/auto-generate-yaml-scheme branch from 4b9e8a8 to 2096a43 Compare March 27, 2025 21:34
@tgs-sc
Copy link
Contributor Author

tgs-sc commented Mar 28, 2025

@jh7370, I apologize in advance, but I saw that you recently looked at a patch to add new functionality (new virtual error method) to this part of llvm for working with the yaml format. Could you please look at this patch, because it also adds some new functionality without breaking backward compatibility.

@jh7370
Copy link
Collaborator

jh7370 commented Mar 31, 2025

@jh7370, I apologize in advance, but I saw that you recently looked at a patch to add new functionality (new virtual error method) to this part of llvm for working with the yaml format. Could you please look at this patch, because it also adds some new functionality without breaking backward compatibility.

Hi @tgs-sc, I've added this to my backlog. Please note it may take me a while to review - I've just come back from 2 weeks off work and have a lot to catch up on, plus some upcoming deadlines.

@tgs-sc tgs-sc force-pushed the users/tgs-sc/auto-generate-yaml-scheme branch from 2096a43 to efe6840 Compare March 31, 2025 11:30
Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The term you're looking for is "Schema" not "Scheme".

Realistically, this change appears to be adding too much in one go, so I'm not going to be able to meaningfully review it. Could it be broken up into smaller pieces?

I'm also likely to still be overwhelmed for the next month, with very limited review time during this time.

Have you solicited any feedback for who would want this functionality with an RFC on the LLVM Discourse or similar?

enum class IOKind : uint8_t {
Outputting,
Inputting,
SchemeGenering,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
SchemeGenering,
GeneratingSchema,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed

void enumCase(T &Val, const char* Str, const T ConstVal) {
if ( matchEnumScalar(Str, outputting() && Val == ConstVal) ) {
if (matchEnumScalar(Str,
getKind() == IOKind::Outputting && Val == ConstVal)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this and similar code need to change rather than continue to use outputting()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is because generate schema should behave like Outputting in some cases and like Inputting in others. And in such a case I had to explicitly specify these places, implying that !IOKind::Inputting == (IOKind::Outputting || IOKind::GeneratingSchema). I also decided to leave virtual method outputting() not to break backward compatibility as it is potentially can be called in IO class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also decided to leave virtual method outputting() not to break backward compatibility as it is potentially can be called in IO class.

I don't agree that this maintains backward compatibility, at least not in a meaningful way. The fact that IOKind is needed for GenerateSchema itself implies that uses of IO exist which are broken when relying on just outputting(), right?

In looking over the code (at least the latest version) I worry that the complexity is growing to the point that the current code sharing approach (a base class with conditions) is not the best approach.

@tgs-sc tgs-sc force-pushed the users/tgs-sc/auto-generate-yaml-scheme branch from efe6840 to 92222dc Compare April 8, 2025 15:02
@tgs-sc
Copy link
Contributor Author

tgs-sc commented Apr 8, 2025

Hi @jh7370!

Realistically, this change appears to be adding too much in one go, so I'm not going to be able to meaningfully review it.
Could it be broken up into smaller pieces?

I think that I can split this patch into 2 commits: added new Derived class GenerateSchema (as this is something monolite) and unit tests for it, but it seems to me that this way you can see better how you can use the new functionality.

Have you solicited any feedback for who would want this functionality with an RFC on the LLVM Discourse or similar?

Currently there is no such topic, but I can make it.

@jh7370
Copy link
Collaborator

jh7370 commented Apr 14, 2025

Hi @jh7370! I have created a suggestion for adding this functionality https://discourse.llvm.org/t/rfc-support-yamlgenerateschema-for-producing-yaml-schemas/85806/2, but unfortunately no one has responded yet. Probably you know someone who has worked on this part of llvm and who can give some comments?

I suspect your problem is that you've posted it in the wrong category. "Project Infrastructure" is for stuff that supports how the LLVM project works, like build bots, GitHub usage and so on. You probably want to post your RFC in the "LLVM Project" category, where it'll attract more attention. You should also look at the people who work in the areas that you envisage this functionality benefiting and try CC'ing them in the thread, as it is these people you should try to get interested in.

@github-actions
Copy link

github-actions bot commented Apr 14, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@tgs-sc tgs-sc force-pushed the users/tgs-sc/auto-generate-yaml-scheme branch from 92222dc to 4d11015 Compare April 14, 2025 11:43
@tgs-sc
Copy link
Contributor Author

tgs-sc commented Apr 16, 2025

@jholewinski, @silvasean, I apologize for disturbing, but I have seen that you had already participated in reviewing this part of llvm. May be this patch will be interesting for you so take a look please!

@tgs-sc
Copy link
Contributor Author

tgs-sc commented Apr 23, 2025

@hyp, sorry to bother you but maybe you can take a look at this patch?

@jh7370
Copy link
Collaborator

jh7370 commented Apr 24, 2025

Hi @tgs-sc, as you wrote an RFC to accompany this PR, I think it's more important you seek some input on that RFC there rather than here on the PR. In particular, I'd reach out (on the RFC on Discord) to the maintainers and contributors to the tools this PR is aimed to help with, rather than the people who occasionally submit patches to areas of yaml2obj/ObjectYAML.

@tgs-sc tgs-sc changed the title [llvm][Support] Add YAMLSchemeGen for producing YAML Schemes from YAMLTraits [llvm][Support] Add YAMLGenerateSchema for producing YAML Schemes from YAMLTraits Jul 3, 2025
@tgs-sc tgs-sc force-pushed the users/tgs-sc/auto-generate-yaml-scheme branch from 4d11015 to 1c31d85 Compare August 19, 2025 11:36
@tgs-sc tgs-sc force-pushed the users/tgs-sc/auto-generate-yaml-scheme branch from 1c31d85 to 1a5e0eb Compare September 16, 2025 12:51
@tgs-sc tgs-sc force-pushed the users/tgs-sc/auto-generate-yaml-scheme branch from 1a5e0eb to 7a22c9c Compare October 23, 2025 14:27
@tgs-sc tgs-sc changed the title [llvm][Support] Add YAMLGenerateSchema for producing YAML Schemes from YAMLTraits [llvm][Support] Add YAMLGenerateSchema for producing YAML schemas Oct 23, 2025
@tgs-sc tgs-sc force-pushed the users/tgs-sc/auto-generate-yaml-scheme branch from 7a22c9c to cc399e1 Compare October 23, 2025 14:39
@@ -0,0 +1,400 @@
//===- llvm/Support/YAMLGenerateSchema.h ------------------------*- C++ -*-===//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't Emacs support obsolete these days? At least Clang-Tidy headers were cleaned.

Copy link
Contributor Author

@tgs-sc tgs-sc Oct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, actually it was in YAMLTraits.h, and as I copied this part there, I left it in the same way. Addressed.

void enumCase(T &Val, StringRef Str, const uint32_t ConstVal) {
if (matchEnumScalar(Str, outputting() && Val == static_cast<T>(ConstVal))) {
if (matchEnumScalar(Str, getKind() == IOKind::Outputting &&
Val == static_cast<T>(ConstVal))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be good idea to run Clang-Format. Same in other similar places.

Copy link
Contributor Author

@tgs-sc tgs-sc Oct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I can not run clang-tidy on the whole file because it is randomly formatted for now (as it is very ancient). But I can run clang-format on the whole file and ask you to merge these changes in separate PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about clang-format-diff.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran clang-format-diff.py, but it didn't change this part (changed some other places)


struct Animal {
std::string Name;
std::optional<int> Age;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please include optional.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

@tgs-sc tgs-sc force-pushed the users/tgs-sc/auto-generate-yaml-scheme branch from cc399e1 to 374f8ca Compare October 27, 2025 20:04
@@ -0,0 +1,400 @@
//===- YAMLGenerateSchema.h -------------------------------------*- C++ -*-===//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//===- YAMLGenerateSchema.h -------------------------------------*- C++ -*-===//
//===----------------------------------------------------------------------===//

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you want to remove the name of this header here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is Emacs artifacts. As I mentioned before, they were cleared at least in Clang-Tidy code recently.

@tgs-sc tgs-sc force-pushed the users/tgs-sc/auto-generate-yaml-scheme branch from 374f8ca to aeabbae Compare October 28, 2025 00:07
Introduced a way for generating schema for validating input YAML. This can be
useful to see the full structure of the input YAML file for different llvm
based tools that use existing YAML parser, for example clang-format,
clang-tidy e.t.c. This commit also can be useful for yaml-language-server.
@tgs-sc tgs-sc force-pushed the users/tgs-sc/auto-generate-yaml-scheme branch from aeabbae to f5c71f4 Compare October 28, 2025 00:47
@tgs-sc
Copy link
Contributor Author

tgs-sc commented Oct 30, 2025

@EugeneZelenko, hi! Sorry to bother you, but maybe you can call your friend so that they would also take a look at this patch? Or do you like everything in general?

@EugeneZelenko
Copy link
Contributor

@EugeneZelenko, hi! Sorry to bother you, but maybe you can call your friend so that they would also take a look at this patch? Or do you like everything in general?

You could look on files history to find developers active in this area.

1 similar comment
@EugeneZelenko
Copy link
Contributor

@EugeneZelenko, hi! Sorry to bother you, but maybe you can call your friend so that they would also take a look at this patch? Or do you like everything in general?

You could look on files history to find developers active in this area.

@tgs-sc
Copy link
Contributor Author

tgs-sc commented Oct 30, 2025

You could look on files history to find developers active in this area.

Well, @jh7370 was active developer in this area (as you can see, he has already left some comments here in the beginning of this PR), but after that he told me:

Have you solicited any feedback for who would want this functionality with an RFC on the LLVM Discourse or similar?

You should also look at the people who work in the areas that you envisage this functionality benefiting and try CC'ing them in the thread, as it is these people you should try to get interested in.

So, he asked for feedback from people that commit to the components this patch may be useful for.

Also, the last patch from the person who created this initial support (Nick Kledzik [email protected]) was on 2016.

@tgs-sc
Copy link
Contributor Author

tgs-sc commented Nov 5, 2025

@EugeneZelenko, hi! Polite ping. Can you please tell me if this patch already looks fine for you (and probably you can approve it) either you expect some more changes?

@EugeneZelenko
Copy link
Contributor

@EugeneZelenko, hi! Polite ping. Can you please tell me if this patch already looks fine for you (and probably you can approve it) either you expect some more changes?

You need to ask llvm:support developers for that. Sorry, I could not suggest you better method to find reviewers than looking into changed files history.

@tgs-sc
Copy link
Contributor Author

tgs-sc commented Nov 5, 2025

@slinder1, Hi! Sorry to bother you but I have seen that you recently reviewed some patched in this area. Can you please look at this one?

@slinder1
Copy link
Contributor

@slinder1, Hi! Sorry to bother you but I have seen that you recently reviewed some patched in this area. Can you please look at this one?

I am a bit behind on some other reviews, but this is on the queue!

@tgs-sc
Copy link
Contributor Author

tgs-sc commented Nov 18, 2025

@slinder1, hi! Polite ping since a week has passed.

@slinder1 slinder1 self-requested a review November 26, 2025 18:13
Copy link
Contributor

@slinder1 slinder1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tgs-sc Did you generate any of this code using an LLM or consult an LLM to make some of the design choices?

This is not an accusation nor do I mean to disparage, I am just seeing a unique mix of things which (to me) imply both high-level understanding of a complicated (in terms of language features used/abused) part of the codebase as well as fundamental misuse of some C++ facilities.

I am including the comments I had already drafted by the time I decided to stop and ask, but I don't want to invest more effort in the review until I understand better how I can approach it in a way that benefits you, me, and the llvm-project.


namespace yaml {

class GenerateSchema : public IO {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can this also be final?

void enumCase(T &Val, const char* Str, const T ConstVal) {
if ( matchEnumScalar(Str, outputting() && Val == ConstVal) ) {
if (matchEnumScalar(Str,
getKind() == IOKind::Outputting && Val == ConstVal)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also decided to leave virtual method outputting() not to break backward compatibility as it is potentially can be called in IO class.

I don't agree that this maintains backward compatibility, at least not in a meaningful way. The fact that IOKind is needed for GenerateSchema itself implies that uses of IO exist which are broken when relying on just outputting(), right?

In looking over the code (at least the latest version) I worry that the complexity is growing to the point that the current code sharing approach (a base class with conditions) is not the best approach.

}

bool GenerateSchema::bitSetMatch(StringRef Val, bool Arg) {
return matchEnumScalar(Val, Arg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every call to bitSetMatch which dynamically reaches GenerateSchema::bitSetMatch will always pass Match==false, so the use of Arg is a bit misleading. this can just be matchEnumScalar(Val, false).

};

private:
std::vector<std::unique_ptr<SchemaNode>> SchemaNodes;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a particular reason to represent these as a vector of smart pointers?

The other IO classes allocate nodes using BumpPtrAllocators, which makes sense as the lifetime of the nodes is naturally constrained to the containing IO class.

}

json::Value GenerateSchema::UserDefinedProperty::toJSON() const {
return Value->toJSON();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole class hierarchy under SchemaNode seems to add more complexity than it is worth. Do you have use-cases where it is more than a 1:1 mapping onto json::Value?

It adds maintenance (in terms of SLOC), and has a runtime cost (the allocation and pointer-chasing in the current implementation, as well as virtual dispatch for toJSON).

}

json::Value GenerateSchema::EnumProperty::toJSON() const {
json::Array JSONArray;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bodies of these toJSON overrides contain a lot of copy-pasted bodies. For example there are 3 which are just:

  json::Array JSONArray;
  for (StringRef Value : *this) {
    json::Value JSONValue(Value.data());
    JSONArray.emplace_back(std::move(JSONValue));
  }
  return JSONArray;

If you're already paying for virtual dispatch then why not have virtual base classes for the ~3 actual behaviors here, rather than repeat yourself in every leaf class's method?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants